A DNA pooling-based case-control study of myopia candidate genes COL11A1, COL18A1, FBN1, and PLOD1 in a Chinese population.

Purpose: We examined the relationship between high myopia and common polymorphisms in four candidate genes: collagen, type XI, alpha 1 (COL11A1); collagen, type XVIII, alpha 1 (COL18A1); fibrillin 1 (FBN1); and procollagen-lysine 1,2-oxoglutarate 5-dioxygenase 1 (PLOD1). These genes were selected because rare pathogenic mutations in them cause disease syndromes that have myopia, usually high myopia, as one of the common presenting features.
Methods: This study recruited 600 unrelated Han Chinese subjects including 300 cases with high myopia (spherical equivalent or SE≤-8.00 diopters) and 300 controls (SE within ±1.00 diopter). A total of 66 tag single nucleotide polymorphisms (SNPs) were selected for study from these four candidate genes. The study adopted a DNA pooling strategy with an initial screen of DNA pools to identify putatively positive SNPs and then confirmed the “positive” SNPs by genotyping the individual samples forming the original DNA pools. DNA pools were each constructed by mixing equal amounts of DNA from 50 individuals with the same phenotype status. Six case pools were prepared from 300 cases and six control pools from 300 controls. Allele frequencies of DNA pools were estimated by analyzing the primer-extended products with denaturing high performance liquid chromatography and compared between case pools and control pools with nested ANOVA.
Results: In the first stage, 60 SNPs from the 4 candidate genes were successfully screened using the DNA pooling approach. Of these, 6 SNPs showed a statistically significant difference in estimated allele frequencies between case pools and control pools at p<0.10. In the second stage, these “positive” SNPs were followed up by individual genotyping, but failed to be confirmed via standard single-marker and haplotype analyses.
Conclusions: Common polymorphisms in these four candidate genes (COL11A1, COL18A1, FBN1 and PLOD1) were unlikely to play important roles in the genetic susceptibility to high myopia.

Myopia is the commonest ocular disorder in the world. In general, it is more prevalent in Oriental populations (60%-80%) than in Caucasian populations (10%-25%) [1]. Subjects with high myopia, usually defined as −6.0 diopters (D) or worse, are more vulnerable to ocular pathologies later in their life, such as cataract, glaucoma and retinal detachment [2]. Myopia is a complex disease with contribution from genetic factors, environmental factors and their interactions [3,4]. Genetic association studies are usually used to identify myopia susceptibility genes, which tend to have small effect size [4,5]. Genome-wide association studies generate hypotheses for subsequent follow-up and are the method of choice, but are still beyond the reach of many research groups in terms of the cost. Another popular approach is to examine candidate genes, which are usually selected on the basis of their biology and function [4,6]. Genes underlying heritable disease syndromes with myopia as one of the common presenting features can be selected as myopia candidate genes for study [4,6].
Correspondence to: Prof. Shea Ping Yip, Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China; Phone: +852 3400 8571; FAX: +852 2362 4365; email: shea.ping.yip@inet.polyu.edu.hk
Stickler syndrome is an autosomal dominant disease affecting types II and XI collagen expressed in vitreous and cartilage, and has highly variable clinical features affecting the eye, the ear and joints [7,8]. In particular, Stickler syndrome types 1 and 2 (STL1 and STL2) have myopia and abnormal vitreous while type 3 is a non-ocular form of the syndrome. STL1 is caused by mutations in the collagen, type II, alpha 1 (COL2A1) gene while STL2 arises from mutations in the collagen, type XI, alpha 1 (COL11A1) gene. Interestingly, COL11A1 mutations are also known to cause Marshall syndrome or Marshall/Stickler syndrome, both of which have myopia as a common feature [7,8]. Knobloch syndrome is an autosomal recessive disease characterized by high myopia, vitreoretinal degeneration and occipital encephalocele, and is caused by mutations in the collagen, type XVIII, alpha 1 (COL18A1) locus [8-10]. Marfan syndrome is an autosomal dominant disorder of connective tissue with major manifestations affecting the ocular, skeletal and cardiovascular systems [11,12]. The major ocular abnormalities are lens dislocation and high myopia due to increased axial length. Marfan syndrome is classically caused by mutations in the fibrillin 1 (FBN1) gene. Ehlers-Danlos syndrome is a heterogeneous group of genetic disorders with major clinical features of skin hyperextensibility, atrophic scarring, joint hypermobility and generalized tissue fragility [11]. Type VI or the kyphoscoliotic form of Ehlers-Danlos syndrome is autosomal recessive in nature with added clinical features of kyphoscoliosis (a form of curved spine) and scleral fragility [11,12]. High myopia is also a common feature [13]. The kyphoscoliotic form of Ehlers-Danlos syndrome is caused by mutations in the PLOD1 gene, which encodes the enzyme procollagen-lysine 1,2-oxoglutarate 5-dioxygenase 1 (PLOD1; also known as lysyl hydroxylase 1) responsible for forming cross-links in collagens via hydroxylysine-based pyridinoline.
The genes responsible for these syndromes are expressed in various parts of the eye. These disease syndromes are caused by rare pathogenic loss-of-function mutations that are not found in healthy individuals. The mechanisms leading to the common occurrence of myopia, usually high myopia, in these syndromes are not well established. We hypothesized that common polymorphisms in these genes could be predisposing genetic factors for high myopia [4,6]. Indeed, common polymorphisms in COL2A1, the causative gene for STL1, have been found to be associated with myopia in two family-based association studies [14,15]. In this study, we evaluated COL11A1, COL18A1, FBN1, and PLOD1 (Table 1) as candidate genes for high myopia in a Chinese population with a case-control study approach.
We performed the study with an initial screen of DNA pools to identify putatively positive single nucleotide polymorphisms (SNPs) and then confirmed the "positive" SNPs by genotyping the individual samples forming the original DNA pools. The initial screen of DNA pools was intended to cut down the time and cost involved in sample-by-sample genotyping [4,16]. DNA pools were constructed by mixing equal amounts of DNA from many subjects with the same disease status. In the present study, "case pools" were prepared from individuals with high myopia (cases) and "control pools" from emmetropes (controls). Allele frequencies of DNA pools were estimated by analyzing the primer-extended products with denaturing high performance liquid chromatography (DHPLC) [17] and compared between case pools and control pools with an appropriate statistical method, nested analysis of variance (ANOVA) [18].

Subjects:
This study recruited 600 unrelated Han Chinese individuals including 300 cases with high myopia (spherical equivalent or SE ≤-8.00 D for both eyes) and 300 control subjects (SE within ±1.00 D for both eyes). The study was approved by the Human Subjects Ethics Subcommittee of the Hong Kong Polytechnic University and adhered to the tenets of the Declaration of Helsinki. Written informed consent was obtained from all participating subjects. Eye examinations for all participants were conducted in the Optometry Clinic of the University; blood samples were collected and DNA was extracted as described previously [19]. Of particular relevance to this study was the exclusion of subjects who showed obvious signs of ocular disease or other inherited disease associated with myopia (e.g., Stickler syndrome, Marshall syndrome, Knobloch syndrome, Marfan syndrome, Ehlers-Danlos syndrome, etc).

Construction of DNA pools:
A PicoGreen method (Invitrogen, Carlsbad, CA) was used to accurately quantify all DNA samples in accordance with the manufacturer's protocols. The DNA samples were then diluted to 5.0±0.3 ng/μl and mixed in equal volumes to construct DNA pools. DNA from 50 distinct individuals sharing the same phenotype was mixed to construct a single pool. In total, six case pools were constructed from 300 cases, and six control pools from 300 controls.

Tag SNP selection:
Four candidate genes were investigated in this study: COL11A1, COL18A1, FBN1, and PLOD1 (Table 1). With the Tagger program [20], the following criteria were used to select tag SNPs from each gene of interest and its adjoining genomic region (3 kb upstream and 3 kb downstream): pairwise tagging algorithm, r² ≥0.8 and minor allele frequency (MAF) ≥0.10. The Han Chinese genotype data from the International HapMap Project database (release 23a, phase II) were used for tag SNP selection. In total, 66 tag SNPs were selected from these four candidate genes and screened by the DNA pooling strategy (Table 1).

Estimation of allele frequencies in DNA pools:
Genomic DNA (individual or pooled) was amplified for each SNP with a touchdown protocol in a 15-μl reaction mixture, which contained 0.1 or 0.3 μM of each primer, 1.5 or 2.5 mM MgCl2 [...]
Footnotes to Table 2: *SNPs are arranged down the column in the order of 5′>3′ along the sense strand of the respective gene. †These six SNPs failed to be analyzed by the DNA pooling approach because of failure in PCR (rs1241209 in COL11A1, and rs2838927 in COL18A1) or PE reaction (rs8131523, rs9977482 and rs3818661 in COL18A1, and rs16961220 in FBN1) even after repeated optimization. This is the reason why specific conditions for PCR and/or PE are left blank (indicated as "-"). ‡In addition to ddGTP and ddATP, dTTP was also used in the primer extension reaction for rs2615987 (COL11A1) for an optimal discrimination of the extended products.
Primer extension (PE) reaction was performed in a 25-μl reaction mixture, which contained 10 μl of purified PCR product, 1.5 μM of a specific PE primer (Table 2), 50 μM of each appropriate ddNTP (Table 2) and 1 unit of Therminator (New England Biolabs, Beverly, MA) in a 1× reaction buffer provided by the manufacturer. Amplification was conducted as follows: initial denaturation of 1 min/96 °C, followed by 55 cycles of 10 s/96 °C, 15 s/43 °C and 1 min/60 °C. The WAVE Nucleic Acid Fragment Analysis System (Transgenomic, Omaha, NE) was used for DHPLC analysis of primer-extended products. PE products were analyzed as described previously [21] with the following modifications: a 6% linear gradient change of the working elution buffer over a 3-min period and a different starting concentration of buffer B, which varied with the SNP being studied (Table 2).
Estimation of the relative allele frequencies in DNA pools was based on the peak heights of the PE products as analyzed by DHPLC. Each DNA pool was analyzed in three replicates, and each replicate consisted of a single PCR followed by a single PE reaction and a single DHPLC analysis. Therefore, each SNP had 36 sets of readings for 6 case pools and 6 control pools. For each SNP, a heterozygous sample was first identified by screening 10 to 40 subjects, and then analyzed in three independent runs to obtain a mean value for the so-called "k correction factor" that was used to correct for differential incorporation of ddNTPs in PE reactions as described previously [17].
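The k correction described above can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the correction, in which k is the mean ratio of the two peak heights (first-eluted allele A over second-eluted allele B) in a heterozygote and the pooled frequency of allele A is A / (A + k·B). The function names and all peak heights below are hypothetical and are not taken from the study.

```python
from statistics import mean


def k_factor(het_peak_pairs):
    """Mean A/B peak-height ratio across replicate runs of one heterozygote.

    A heterozygote carries both alleles at a true ratio of 1:1, so any
    departure of A/B from 1 reflects unequal ddNTP incorporation.
    """
    return mean(a / b for a, b in het_peak_pairs)


def pooled_allele_frequency(height_a, height_b, k):
    """Estimated frequency of allele A in a pool, assumed form A / (A + k*B)."""
    return height_a / (height_a + k * height_b)


# Hypothetical peak heights from three independent runs of a heterozygote.
het_runs = [(10.2, 9.8), (10.5, 10.0), (9.9, 9.6)]
k = k_factor(het_runs)

# Hypothetical peak heights for three replicate readings of one DNA pool.
pool_replicates = [(3.1, 6.4), (3.0, 6.5), (3.2, 6.3)]
pool_estimate = mean(pooled_allele_frequency(a, b, k) for a, b in pool_replicates)
print(f"k = {k:.3f}, estimated frequency of allele A = {pool_estimate:.4f}")
```

A quick sanity check on the assumed formula: applied to the heterozygote itself, the corrected estimate is 0.5 by construction.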

Individual genotyping:
The positive findings (6 SNPs) in the initial screen of DNA pools were followed up by genotyping the individual samples that formed the original DNA pools. The MassARRAY iPLEX Gold assay was used to genotype the samples in accordance with the manufacturer's protocols for 5 SNPs (Table 3). Because of the multiplexing format of the MassARRAY system, these SNPs were grouped and genotyped together with SNPs of other studies by a local service provider. One SNP (rs2838922) could not be grouped together with other SNPs for the MassARRAY system, and was genotyped by the method of restriction fragment length polymorphism (Table 3). The fragment was amplified using touchdown PCR as described above with the following specific conditions: 0.1 μM of each primer, 1.5 mM MgCl2, 64 °C as the initial annealing temperature and 58 °C as the final target annealing temperature. Overnight digestion of the PCR products by TaqI (Fermentas, Vilnius, Lithuania) at 65 °C was performed according to the manufacturer's instructions. Digested products were separated by electrophoresis in polyacrylamide gels.

Statistical analysis:
Ocular data were analyzed with the STATA package (version 8.2; StataCorp, College Station, TX). Subjects were classified as cases (affected with high myopia) or controls (unaffected). For a given SNP, the relative allele frequencies were estimated from the peak heights of the two extension products and adjusted using the k correction factor according to the method reported by Hoogendoorn et al. [17]. With the STATA package, nested ANOVA [18] was used to compare the relative allele frequencies of the case pools and the control pools. A p value ≤0.10 for the comparison between case pools and control pools was used as the threshold for following up SNPs with individual genotyping. Genotype data of individual samples were tested for Hardy-Weinberg equilibrium (HWE), and compared between cases and controls for association. The PLINK package (version 1.07) [22] was used for analysis. The linkage disequilibrium measures were calculated and plotted using Haploview (version 4.2) [23]. Haplotype blocks were constructed using the algorithm known as the solid spine of LD, which is unique to Haploview. Potential interactions among SNPs were examined using the method of multifactor dimensionality reduction (MDR) [24].
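The pool comparison can be sketched as a textbook balanced two-level nested ANOVA, with pools nested within phenotype group and replicate readings nested within pools. This is an illustrative assumption about the model structure, not the exact STATA specification used in the study. The group effect is tested against the pool-within-group mean square, so with 2 groups of 6 pools each the F statistic has (1, 10) degrees of freedom. The function name and the readings below are hypothetical.

```python
from statistics import mean


def nested_anova_f(groups):
    """F statistic for the group effect in a balanced nested design.

    groups: list of groups; each group is a list of pools; each pool is a
    list of replicate allele-frequency readings. Every group must have the
    same number of pools and every pool the same number of replicates.
    Returns (F, df_group, df_pool_within_group).
    """
    a = len(groups)          # number of groups (cases, controls)
    b = len(groups[0])       # pools per group
    n = len(groups[0][0])    # replicates per pool
    pool_means = [[mean(pool) for pool in group] for group in groups]
    group_means = [mean(pm) for pm in pool_means]
    grand_mean = mean(group_means)

    ss_group = b * n * sum((gm - grand_mean) ** 2 for gm in group_means)
    ss_pool = n * sum(
        (pm - gm) ** 2
        for pms, gm in zip(pool_means, group_means)
        for pm in pms
    )
    df_group = a - 1
    df_pool = a * (b - 1)
    f_stat = (ss_group / df_group) / (ss_pool / df_pool)
    return f_stat, df_group, df_pool


# Hypothetical readings: 2 groups x 3 pools x 3 replicates.
case_pools = [[0.41, 0.42, 0.40], [0.44, 0.43, 0.45], [0.39, 0.40, 0.41]]
control_pools = [[0.46, 0.47, 0.45], [0.48, 0.47, 0.49], [0.45, 0.44, 0.46]]
f_stat, df1, df2 = nested_anova_f([case_pools, control_pools])
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

The sketch stops at the F statistic because the Python standard library has no F distribution; a p value could be obtained from a statistics package, for example the survival function of the F distribution in SciPy.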

Analysis of the ocular data:
The characteristics of the participating subjects have been reported in one of our previous studies [19]. They are briefly summarized as follows. The average SE was −10.53 (range: −24.00 to −8.00) D for cases, and 0.03 (range: −1.00 to 0.88) D for controls. The average axial length was 27.76 (range: 24.62 to 31.29) mm for cases, and 23.85 (range: 21.24 to 27.71) mm for controls. These ocular data are for the right eyes. The average age was 27.7 (range: 15 to 48) years for cases, and 24.9 (range: 17 to 46) years for controls. There were more male subjects in the control group than in the case group (43.7% vs 28.3%, p=4.30×10⁻⁵).

Analysis of results for DNA pools:
The results are summarized in Table 4. Of the 66 tag SNPs selected for study, 6 did not give any results because of failure in PCR or PE even after repeated optimization as noted in a footnote of Table 2. For the 60 SNPs successfully analyzed, the k correction factor ranged from 0.29 to 1.56 with a mean of 1.02; it ranged from 0.83 to 1.21 for 54 SNPs (90% of the SNPs analyzed). The estimated frequencies of the first eluted alleles ranged from 0.1041 to 0.9246 for case pools, and from 0.0929 to 0.9516 for control pools. The difference (case pools minus control pools) in estimated allele frequencies varied from −0.0510 to 0.0337. At a lenient threshold of p≤0.10, six SNPs gave significant results, which were followed up with individual genotyping for confirmation. These included one SNP in the COL11A1 gene: rs17127311 (difference=-0.0325, p=0.0981). The other five "positive" SNPs were in the COL18A1 gene: rs2838922 (difference=0.0367, p=0.0572), rs11911327 (difference=-0.0315, p=0.0959), rs2236454 (difference=-0.0510, p=0.0629), rs2236457 (difference=0.0255, p=0.0779), and rs2236475 (difference=-0.0367, p=0.0852). No significant difference in allele frequencies was demonstrated in the remaining 54 SNPs, which were thus not tested any further.

Confirmation of pooled DNA results by individual genotyping:
The genotypes of these six SNPs were in HWE except for two SNPs (rs2838922 and rs2236475) in cases and two SNPs (rs17127311 and rs2838922) in controls (Table 5). Deviation from HWE in cases can be a signal for SNP-disease association [25]. The two SNPs violating HWE in controls (rs17127311 and rs2838922) were dropped from subsequent analysis. One haplotype block was constructed for three SNPs as shown in Figure 1. The remaining four SNPs were analyzed for association with high myopia with adjustment for gender, but did not show significant differences in allele frequencies between cases and controls (Table 5). Sliding window-based haplotype analysis of these four SNPs (rs11911327 [S1], rs2236454 [S2], rs2236457 [S3], and rs2236475 [S4] in the 5′>3′ order along the sense strand of the COL18A1 gene) did not show any association with high myopia either. The p values for the omnibus tests of haplotypes adjusted for gender were as follows: 0.2000 (S1-S2), 0.2850 (S2-S3), 0.1860 (S3-S4), 0.2560 (S1-S2-S3), 0.1840 (S2-S3-S4), and 0.2810 (S1-S2-S3-S4). MDR did not show any significant interaction among the SNPs either, with the best model consisting of all four SNPs (p=0.1719).

DISCUSSION
This study explored the relationship of common polymorphisms in four candidate genes (COL11A1, COL18A1, FBN1, and PLOD1) with high myopia in a Han Chinese population. Rare pathogenic mutations in these candidate genes cause disease syndromes that have myopia, usually high myopia, as one of the common presenting features [8-13]. It was logical to investigate whether common polymorphisms of these genes would be predisposing genetic factors for high myopia. This argument was strengthened by the positive association between common polymorphisms of the COL2A1 gene, the causative gene underlying STL1, and myopia [14,15]. The relationship between the selected candidate genes and high myopia has not been studied before. With this background, we examined these candidate genes with an efficient approach based on the initial screening of DNA pools. Six case pools and six control pools were constructed and screened by DHPLC analysis of primer-extended products. Our case subjects were recruited with a refractive error (SE) threshold of −8.00 D or worse for both eyes, and the average SE was −10.53 D. Such a high threshold was adopted for case subject recruitment so as to enrich the contribution of genetic factors to the extreme phenotype and to enhance the homogeneity of the case phenotype [4,6]. Our control subjects were emmetropic and were not randomly recruited from the general Chinese population in Hong Kong. This would also enhance the difference in the genetic components contributing to the phenotype difference between our cases and controls. Random population-based controls were less desirable because they would be enriched with subjects with mild to moderate myopia from our population, a population with a high prevalence of myopia [26]. These strategies would enhance the power of our study.
In the first stage, 60 SNPs from the 4 candidate genes were successfully screened using the DNA pooling approach. Of these, 6 SNPs gave a p value of less than 0.10 for the statistical comparison of allele frequency differences between case pools and control pools by nested ANOVA (Table 4). A lenient significance threshold of p≤0.10 was used to avoid missing potentially significant SNPs. In the second stage, these "positive" SNPs were followed up by individual genotyping, but failed to be confirmed via standard single-marker (Table 5) and haplotype analyses. In conclusion, common polymorphisms in these four candidate genes (COL11A1, COL18A1, FBN1, and PLOD1) were unlikely to play important roles in the genetic susceptibility to high myopia.
Footnotes to Table 5: *The major allele is designated as "1" and minor allele as "2"; and the genotype counts are indicated as the counts of the genotypes 11, 12, and 22, respectively. This study had 300 cases and 300 controls. Note that the total genotype counts may not add up to these expected numbers because a few samples failed to be genotyped in a random fashion. †The allelic test is performed with the gender as a covariate to adjust for the potential confounding by gender because the cases and controls differ significantly in the proportions of male and female subjects (p=4.30×10⁻⁵). The odds ratio (OR) is calculated for the minor allele (allele 2) with the major allele (allele 1) as the reference. ‡Association tests are not performed for these two SNPs (rs17127311 and rs2838922) because the genotypes in the controls are not in Hardy-Weinberg equilibrium (HWE).
Figure 1. Single nucleotide polymorphisms (SNPs) and their linkage disequilibrium (LD) for the COL18A1 gene. The SNPs are indicated from the 5′ end (left) to the 3′ end (right) of the gene. The LD measures are expressed as D' and r² for all subjects under study (cases and controls combined), and are calculated by Haploview. The shades of red (for D') and gray (for r²) represent the magnitude of the measures with deep red equal to 100% (or 1.00), which is omitted in the diagram to avoid cluttering.
It is interesting to note that deviations from HWE were observed in the control group for two SNPs (rs17127311 and rs2838922; Table 5). The Hardy-Weinberg principle assumes a very large population in which mating is random and there is no migration, mutation, or natural selection [27]. Theoretically, violations of these assumptions can result in deviations from HWE. However, deviations from HWE can also indicate the presence of genotyping errors [28]. We were very careful in carrying out the genotyping and calling the genotypes, and we confirmed any ambiguous genotypes by direct DNA sequencing. Nevertheless, we cannot entirely rule out the possibility of genotyping errors as a cause for deviations from HWE. It is generally recommended not to perform case-control comparison for such genotype data to avoid false positive association results [28].
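As an illustration of how a deviation from HWE can be detected from genotype counts, the following Python sketch applies the textbook chi-square goodness-of-fit test with one degree of freedom. The text does not state which HWE test was applied, so this is an assumption made for illustration only; the function name and the genotype counts are hypothetical and are not data from Table 5.

```python
import math


def hwe_chi_square(n11, n12, n22):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    n11, n12, n22: observed counts of genotypes 11, 12 and 22.
    Returns (chi2, p_value) with 1 degree of freedom.
    Both alleles must be observed, otherwise an expected count is zero.
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)   # frequency of allele 1
    q = 1.0 - p                     # frequency of allele 2
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n11, n12, n22)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control-group genotype counts (11, 12, 22).
chi2, p_value = hwe_chi_square(150, 110, 40)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

For small samples or rare alleles an exact test is usually preferred over this approximation.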
We used nested ANOVA [18] to test for differences in estimated allele frequencies between case pools and control pools. Nested ANOVA can properly handle the variance components of the errors arising from sampling of the subjects in forming the pools and of the technical errors arising from various stages of allele frequency estimation, e.g., unequal amounts of individual DNA samples in forming the pools, errors in PCR and primer extension reaction, and in DHPLC analysis. However, the individual variance components could not be estimated directly. Thus, it was not possible to calculate the power of our DNA pooling-based approach, which would be expected to be lower than the power of an approach based on genotyping of all individual samples for all tag SNPs. In addition, our DNA pooling strategy did not allow haplotype analysis for SNPs examined only in the DNA pools [16]. On the other hand, we enhanced the power of our study by using stringent criteria for recruiting the cases (extreme phenotype) and the controls (supernormal), as has been discussed above [4,6,26]. DNA pooling has been proven to be effective as an initial screen of SNPs to search for putative genetic markers associated with a phenotype of interest for subsequent follow-up studies based on conventional genotyping of individual samples [16,29]. Saving in the amounts of DNA used and in the cost and time involved in genotyping is the major advantage of DNA pooling. For each SNP, 36 separate PCRs and the subsequent analyses were needed for 6 case pools and 6 control pools, together with 3 other separate PCRs for a heterozygous sample to determine the k correction factor. In comparison to genotyping 600 samples individually, our current DNA pooling approach could theoretically reduce the amounts of DNA used and the genotyping work by up to 93.5%. Note that we mixed DNA from 50 distinct individuals to form one DNA pool.
This approach of using more pools of smaller size has been shown to be superior to the use of fewer DNA pools each formed from a larger number of subjects for genetic association studies of candidate genes [30].
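The 93.5% figure quoted above follows from simple counting, as the short Python sketch below shows: 12 pools analyzed in triplicate plus 3 runs of a heterozygote give 39 reactions per SNP, compared with 600 reactions for individual genotyping. The function name is hypothetical.

```python
def pooling_savings(n_individuals, n_pools, replicates_per_pool, het_replicates):
    """Fractional reduction in per-SNP reactions when screening pools
    instead of genotyping every individual sample."""
    pooled_reactions = n_pools * replicates_per_pool + het_replicates
    return 1.0 - pooled_reactions / n_individuals


# 6 case pools + 6 control pools, 3 replicates each, plus 3 heterozygote runs.
saving = pooling_savings(600, 12, 3, 3)
print(f"Reactions saved per SNP: {saving:.1%}")
```

The saving applies to the screening stage only; SNPs that pass the screen still require individual genotyping.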
It may seem that DNA pooling has become less attractive with the recent tremendous reduction in the unit cost of genotyping for high-throughput array-based whole-genome genotyping assays. However, the total cost of such genome-wide genotyping is still too high and hence unaffordable for most research groups. Use of DNA pools for genome-wide genotyping is one of the solutions proposed [31]. Recent studies even suggest the use of DNA pools for deep sequencing using next-generation sequencing technologies to explore the role of rare variants in complex diseases [32,33] because both common and rare variants are believed to contribute to the genetic susceptibility to complex diseases [34]. This new development is particularly important because deep sequencing is even more expensive and produces even larger amounts of data for analysis than genome-wide genotyping. Note that the present study did not address the potential role of rare variants that may contribute to the genetic susceptibility to high myopia, but do not cause the respective Mendelian disease syndromes mentioned above. Indeed, there are already rare variants reported to be associated with high myopia but not congenital stationary night blindness ([35] and unpublished data). Congenital stationary night blindness is an X-linked monogenic ocular disease with high myopia as one of its common presenting features [36].
In summary, we used a DNA pooling approach to examine tag SNPs from four candidate genes (COL11A1, COL18A1, FBN1, and PLOD1), which were selected because pathogenic mutations in these genes cause disease syndromes that have myopia, usually high myopia, as one of the common presenting clinical features. Six SNPs were followed up by individual genotyping, but did not demonstrate any association with high myopia. We concluded that common polymorphisms in these candidate genes were unlikely to be important in the genetic susceptibility to high myopia in Han Chinese.